Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

نویسندگان

  • Samuel V. Angiuoli
  • James R. White
  • Malcolm Matalka
  • Owen White
  • W. Florian Fricke
چکیده

BACKGROUND The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. RESULTS We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. CONCLUSIONS Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, technologies related to the Internet, distributed computing and automated management. The Increasing demand has led to an increase in services resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...

متن کامل

Task Scheduling Algorithm Using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in Cloud Computing

The cloud computing is considered as a computational model which provides the uses requests with resources upon any demand and needs.The need for planning the scheduling of the user's jobs has emerged as an important challenge in the field of cloud computing. It is mainly due to several reasons, including ever-increasing advancements of information technology and an increase of applications and...

متن کامل

GASA: Presentation of an Initiative Method Based on Genetic Algorithm for Task Scheduling in the Cloud Environment

The need for calculating actions has been emerged everywhere and in any time, by advancing of information technology. Cloud computing is the latest response to such needs. Prominent popularity has recently been created for Cloud computing systems. Increasing cloud efficiency is an important subject of consideration. Heterogeneity and diversity among different resources and requests of users in ...

متن کامل

GASA: Presentation of an Initiative Method Based on Genetic Algorithm for Task Scheduling in the Cloud Environment

The need for calculating actions has been emerged everywhere and in any time, by advancing of information technology. Cloud computing is the latest response to such needs. Prominent popularity has recently been created for Cloud computing systems. Increasing cloud efficiency is an important subject of consideration. Heterogeneity and diversity among different resources and requests of users in ...

متن کامل

A Near Optimal Approach in Choosing The Appropriate Physical Machines for Live Virtual Machines Migration in Cloud Computing

Migration of Virtual Machine (VM) is a critical challenge in cloud computing. The process to move VMs or applications from one Physical Machine (PM) to another is known as VM migration. In VM migration several issues should be considered. One of the major issues in VM migration problem is selecting an appropriate PM as a destination for a migrating VM. To face this issue, several approaches are...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2011